<html><head><title>The HarvestMan WebCrawler </title></head><body bgcolor="#FFFFFF">


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  <head>
    <base href="http://harvestmanontheweb.com"/>
    <title>The HarvestMan Webcrawler</title>
     <meta name="author" content="Anand B Pillai"/>
    <meta name="keywords" content="crawler, spider, bot, web-bot, robot, offline, browser, web, Internet, harvest, HarvestMan, HTTP, browsing, searching, Python, tools, aggregator, mining, intelligent, agents, agent-based computing,
     autonomous, documents"/>
    <meta name="description" content="Project page of the HarvestMan WebCrawler"/>
    <meta name="copyright" content="Anand B Pillai"/>
    <meta name="license" content="GNU General Public License, Copyright (C) 2004-2005, Anand B Pillai" />

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
    <link href="style.css" rel="stylesheet" type="text/css"/>

  </head>
 <body>
<div id="masthead"> 
  <h1 id="siteName"><img src="images/HarvestMan_s.jpg" alt="HarvestMan" align="absbottom"> - The HarvestMan Web Crawler</h1> 
</div> 

   <div id="top-navbar">
      <span class="navbar-title">HarvestMan</span>
      <span class="nbsep">:</span>

      <span class="navbar-item first-item"><a href="news.html">
            News</a> </span>

      <span class="nbsep">|</span>
      <span class="navbar-item">
            <a href="/">
            About</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item">
            <a href="releases.html">
            Releases</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item"><a href="http://harvestman-crawler.googlecode.com">
            Project page</a></span>
    
      <hr noshade>
      <span class="nbsep">|</span>
      <span class="navbar-item">
            <a href="faq.html">
            FAQ</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item"><a href="architecture.html">
            Architecture</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item"><a href="download.html">
            Downloads</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item">
            <a href="projects.html">
            Projects</a></span>

      <span class="nbsep">|</span>
      <span class="navbar-item">
            <a href="related.html">
            Links &amp; Related Projects</a></span>
      <br>
      <span class="nbsep">|</span>
      <center>
        <p>
          <a href="http://www.efytimes.com/efytimes/24867/news.htm"><img src="images/FOSS_awards_low_res.jpg" alt="foss india awards icon" align="absbottom"></a>
        </p>
      </center>
      <span class="nbsep">|</span>
      <hr noshade>
<span>
<!-- SiteSearch Google -->
<center>

<form method="get" action="http://www.google.com/search" target="_top">
<table border="0" bgcolor="#ffffff">
<tr><td nowrap="nowrap" valign="top" align="left" height="32">
<a href="http://www.google.com/">
<img src="http://www.google.com/logos/Logo_25wht.gif" border="0" alt="Google" align="middle"></a>
<br/>
<input type="hidden" name="domains" value="http://www.harvestmanontheweb.com"></input>
<input type="text" name="q" size="20" maxlength="255" value="harvestman"></input>
</td></tr>
<tr>
<td nowrap="nowrap">
<table>
<tr>
<td>
<input type="radio" name="sitesearch" value="" checked="checked"></input>
<font size="-1" color="#000000">Web</font>
</td>
<td>
<input type="radio" name="sitesearch" value="http://www.harvestmanontheweb.com"></input>
<font size="-1" color="#000000">this site</font>
</td>
</tr>
</table>
<input type="submit" name="sa" value="Search"></input>
</td></tr></table>
</form>

<!-- SiteSearch Google -->          
</center>
</span>    
     </div>
   
   <div id="main-content">
     <p><span class="section-title"><b><u>Welcome</u></b></span></p>
     <p>Welcome to the project page of the HarvestMan web crawler.</p>
    <p><span class="section-title"><b><u>Companion Website <font color="red">(new)</font></u></b></span></p> 
    <p>HarvestMan has a new <a href="http://harvestman.everythingability.com">companion website</a>,  thanks to Tom Smith.
     The new site has more current information including a Wiki which is updated frequently.</p>
    <p><span class="section-title"><b><u>News <b>(Updated May 08 2008)</b></u></b></span></p> 
    <p>Read the latest <a href="news.html#latest">news</a> about HarvestMan.</p>
    <p><span class="section-title"><b><u>Development Code</u></b></span></p> 
    <p>Browse or download the <a href="current.html">bleeding edge</a> source code.</p>
     <p><span class="section-title"><b><u>About HarvestMan</u></b></span></p>
     <p>HarvestMan is a <a href="http://en.wikipedia.org/wiki/Web_crawler">web crawler application</a> written in the <a href="http://www.python.org">Python</a> programming language.
        HarvestMan can be used to download files from websites according to a number of user-specified rules. The latest version of HarvestMan supports more than 60 customization
        options. HarvestMan is a console (command-line) application.
     </p>
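     <p>As a rough illustration of what rule-based downloading can look like, the Python sketch below filters candidate URLs against a small rule set. The rule names (<i>allowed_hosts</i>, <i>exclude_patterns</i>, <i>max_depth</i>) and the <i>should_fetch</i> function are hypothetical, chosen for this example only. They are not HarvestMan option names or APIs.</p>

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical rule set for illustration only; these are not HarvestMan options.
RULES = {
    "allowed_hosts": {"www.example.com"},    # only crawl these hosts
    "exclude_patterns": ["*.zip", "*.exe"],  # skip paths matching these globs
    "max_depth": 2,                          # do not follow links deeper than this
}


def should_fetch(url, depth, rules=RULES):
    """Return True if the URL passes every rule in the rule set."""
    parts = urlparse(url)
    if parts.hostname not in rules["allowed_hosts"]:
        return False
    if depth > rules["max_depth"]:
        return False
    # Reject the URL if its path matches any excluded glob pattern.
    return not any(fnmatchcase(parts.path, pat) for pat in rules["exclude_patterns"])


if __name__ == "__main__":
    for candidate in ("http://www.example.com/index.html",
                      "http://www.example.com/files/a.zip"):
        print(candidate, should_fetch(candidate, depth=1))
```

     <p>A crawler built this way would call such a check before queuing each discovered link, so that adding a rule only means adding one more test to the function.</p>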
     <p>HarvestMan is the only open source, multithreaded web-crawler program written in the Python language. HarvestMan is released under the <a href="http://www.gnu.org/copyleft/gpl.html">GNU General Public License</a>.</p>
    <p><span class="section-title"><b><u>Current Release</u></b></span></p>
    <p>The latest release of HarvestMan is 1.4.6.</p>
       <ul>
        <li>Read the <a href="files/Changelog.txt">Changelog</a> for this release</li>
        <li><a href="download.html#latest">Download</a> the files for this release</li>
       </ul>
    <p>More information is available on the <a href="releases.html">releases page</a>.
    </p>

    <p><span class="section-title"><b><u>Architecture</u></b></span></p>
    <p>See the <a href="architecture.html">architecture of HarvestMan</a>.</p>
    <p><span class="section-title"><b><u>HarvestMan Configuration</u></b></span></p>    
    <p>HarvestMan is typically run by reading options from a configuration file. The configuration file
    is in XML format and is named <i>config.xml</i> by default. This format supersedes an older text format, in which configuration options were represented
    as name/value pairs in a text file. This <a href="configoptions.html">page</a> describes the older
    format in detail. 
    </p>
    <p>Here is a <a href="http://download.berlios.de/harvestman/config.xml">sample config file</a> of 
     HarvestMan.</p>
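    <p>To give a feel for how a program might read options from an XML file, here is a minimal Python sketch using the standard <i>xml.etree.ElementTree</i> module. The element and attribute names (<i>config</i>, <i>option</i>, <i>name</i>, <i>value</i>) and the option names are assumptions made for this example. The actual <i>config.xml</i> schema may differ, so consult the sample file for the real layout.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_config():
    """Build a tiny XML document in memory and return it as text.

    The element and option names here are hypothetical; they do not
    reproduce the real HarvestMan config.xml schema.
    """
    root = ET.Element("config")
    ET.SubElement(root, "option", name="url", value="http://www.example.com/")
    ET.SubElement(root, "option", name="depth", value="2")
    return ET.tostring(root, encoding="unicode")


def read_options(xml_text):
    """Parse the XML text and return a dict of option name to string value."""
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.get("value") for opt in root.iter("option")}


if __name__ == "__main__":
    options = read_options(build_sample_config())
    print(options["url"], options["depth"])
```

    <p>All values come back as strings in this sketch, so a real reader would still need to convert numeric or boolean options to the right type.</p>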
    <p><span class="section-title"><b><u>HarvestMan command-line options</u></b></span></p>
    <p>HarvestMan also accepts command-line options. The <a href="commandline.html">Command line FAQ</a>
    describes the most important command-line options for HarvestMan.
    </p>
    <p><span class="section-title"><b><u>Developers</u></b></span></p> 
    <p>The original developer of HarvestMan is <a href="http://randombytes.blogspot.com">Anand B Pillai</a>.
    Anand is a software professional based in Bangalore, India.
    </p>
    <p><span class="section-title"><b><u>History</u></b></span></p> 
    <p>For an interesting article on the history of HarvestMan, read this <a href="http://developer.spikesource.com/wiki/index.php/An_Interview_with_the_author_of_harvestman">interview</a>.
    </p>

    <p><span class="section-title"><b><u>Downloads</u></b></span></p> 
    <p>Check the <a href="download.html">download page</a> for HarvestMan downloads.</p>
    <p><span class="section-title"><b><u>Contacts</u></b></span></p> 
    <p><a href="mailto:print 'nocvyynv@tznvy.pbz'.encode('rot13')">Email address</a>.</p>
    
    </div>


<br><br>
     <div id="footer">

      <br>&copy; Anand B Pillai<br>
      Last modified on Jun 16 2008<br>
      <br>
<br>
      <br><br><br><br>
      <p>   
      <table border="0" cellSpacing="10" cellPadding="0" align="center">
      <tr><td></td>
      <td><a href="http://www.python.org"><img src="images/py_powered.gif"></a></td>
      <td><img src="images/HarvestMan_s.jpg"></td>
      </tr>
      </table>
      </p>
</div>


  </body>

</body></html>
 
